CoAtNet: Marrying Convolution and Attention for All Data Sizes